Skip to content

obs(store): track MVCC write conflicts by key-prefix class#587

Merged
bootjp merged 3 commits intomainfrom
obs/store-write-conflict-metric
Apr 22, 2026
Merged

obs(store): track MVCC write conflicts by key-prefix class#587
bootjp merged 3 commits intomainfrom
obs/store-write-conflict-metric

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Apr 22, 2026

Summary

  • Adds a protocol-independent OCC write-conflict metric at the MVCC store layer. Both checkConflicts (write-write) and checkReadConflicts (read-write) in store/lsm_store.go / store/mvcc_store.go now increment a bounded per-(kind, key_prefix) counter immediately before returning NewWriteConflictError; a new monitoring.WriteConflictCollector polls the stores every 5 s (matching the existing DispatchCollector / PebbleCollector cadence) and mirrors deltas into elastickv_store_write_conflict_total{group, kind, key_prefix}.
  • kind splits read (RW conflict via read set) from write (WW conflict via mutation set) since ops implications differ. key_prefix uses a bounded classification aligned with existing prefix constants (!txn|lock|, !txn|rb|, !redis|str|, !hs|, !zs|, !lst|, !ddb|, ...) with an other fallback so user-supplied keys cannot grow cardinality.

Motivation

#585 adds the proxy-side view (per client request). This PR adds the underlying store-side view (per Raft-applied proposal), so the signal is visible regardless of protocol adapter (Redis, DynamoDB, raw KV). In the !txn|rb| production incident the proxy counter spiked per user request but a single request fanned out to many Raft proposals; the store-side metric would have surfaced the real pressure (txn_rollback bucket) directly.

Sample Grafana query

sum by (key_prefix) (rate(elastickv_store_write_conflict_total[5m]))

Split by conflict kind:

sum by (kind, key_prefix) (rate(elastickv_store_write_conflict_total[5m]))

Test plan

  • go test -race -count=1 ./store/... ./monitoring/... ./adapter/... ./kv/...
  • make lint
  • New unit tests:
    • store/write_conflict_counter_test.go: table test for classifyWriteConflictKey covering every bucket; counter record / snapshot / nil-safety.
    • store/write_conflict_store_test.go: integration tests for both pebble-backed and in-memory MVCC stores exercising the WW and RW conflict paths and asserting the right bucket increments.
    • monitoring/write_conflict_test.go: collector delta behaviour, source-reset (counter-down) rebasing, malformed-label skip, nil-safety.

Adds a protocol-independent OCC conflict signal at the store layer:
checkConflicts / checkReadConflicts in the Pebble and in-memory MVCC
stores now increment a per-(kind, key_prefix) counter immediately
before returning NewWriteConflictError. A new WriteConflictCollector
polls each group's store every 5s (matching DispatchCollector and
PebbleCollector cadence) and mirrors deltas into
elastickv_store_write_conflict_total{group, kind, key_prefix}.

Motivation: the proxy-side metric counts per client request; this
store-side metric counts per Raft-applied proposal, which is the
right signal for capacity and lock-resolver races (e.g. the txn
rollback incident pattern). The metric is visible whether the
consumer is Redis, DynamoDB, or raw KV.

The kind label splits read-write (checkReadConflicts) from
write-write (checkConflicts) conflicts since the ops implications
differ; key_prefix uses a bounded classification aligned with the
existing txn/redis/hash/set/zset/list/ddb prefixes.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 46 minutes and 47 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 46 minutes and 47 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 13b508f0-d0c7-4397-9f6b-b279693cc905

📥 Commits

Reviewing files that changed from the base of the PR and between 6023aff and ceb40f8.

📒 Files selected for processing (12)
  • kv/leader_routed_store.go
  • kv/shard_store.go
  • main.go
  • monitoring/registry.go
  • monitoring/write_conflict.go
  • monitoring/write_conflict_test.go
  • store/lsm_store.go
  • store/mvcc_store.go
  • store/store.go
  • store/write_conflict_counter.go
  • store/write_conflict_counter_test.go
  • store/write_conflict_store_test.go
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch obs/store-write-conflict-metric

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a monitoring system to track Optimistic Concurrency Control (OCC) conflicts at the MVCC store layer, adding a Prometheus collector and key-prefix classification logic across the store and monitoring packages. Reviewer feedback suggests that these conflict checks are redundant at the store layer because the leader is responsible for issuing consistent timestamps, and recommends removing the logic to align with the repository's concurrency model.

Comment thread store/lsm_store.go
// failing ApplyMutations corresponds to one OCC conflict
// regardless of how many subsequent mutations would also
// collide, which matches the Prometheus rate semantics.
s.writeConflicts.record(WriteConflictKindWrite, classifyWriteConflictKey(mut.Key))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Optimistic Concurrency Control (OCC) conflict checks are not required at the store layer because the leader is responsible for issuing consistent timestamps. Instead of refactoring this logic, it should be removed to align with the repository's concurrency model.

References
  1. Optimistic Concurrency Control (OCC) conflict checks are not required at the store layer because the leader is responsible for issuing consistent timestamps, which is expected to prevent write conflicts.

Comment thread store/mvcc_store.go
func (s *mvccStore) checkConflictsLocked(mutations []*KVPairMutation, readKeys [][]byte, startTS uint64) error {
for _, mut := range mutations {
if latestVer, ok := s.latestVersionLocked(mut.Key); ok && latestVer.TS > startTS {
s.writeConflicts.record(WriteConflictKindWrite, classifyWriteConflictKey(mut.Key))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Conflict checks at the store layer are unnecessary as the leader ensures consistent timestamps. Following the repository's architectural rules, this logic should be removed rather than centralized.

References
  1. Optimistic Concurrency Control (OCC) conflict checks are not required at the store layer because the leader is responsible for issuing consistent timestamps, which is expected to prevent write conflicts.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Apr 22, 2026

@gemini-code-assist Both medium-priority comments (store/lsm_store.go:872 and store/mvcc_store.go:502) appear to misread the diff. Clarifying with evidence so you can re-evaluate.

This PR does not add checkConflicts / checkReadConflicts

Both functions are pre-existing. The PR only adds a counter increment alongside the existing NewWriteConflictError return. From git show 9a13db4c -- store/lsm_store.go store/mvcc_store.go:

 		if exists && ts > startTS {
+			s.writeConflicts.record(WriteConflictKindWrite, classifyWriteConflictKey(mut.Key))
 			return NewWriteConflictError(mut.Key)
 		}
 		if latestVer, ok := s.latestVersionLocked(mut.Key); ok && latestVer.TS > startTS {
+			s.writeConflicts.record(WriteConflictKindWrite, classifyWriteConflictKey(mut.Key))
 			return NewWriteConflictError(mut.Key)
 		}

No new OCC check is introduced. This is an observability PR: it surfaces the conflict count per (kind, key_prefix) for Prometheus. Removing the OCC check was never on the table here, and doing so would be a correctness regression.

The AGENTS.md rule is about timestamp issuance, not disabling OCC

AGENTS.md:35 (the only rule matching your wording):

Avoid leader-local timestamps in persistence; timestamp issuance should originate from the Raft leader to prevent skewed OCC decisions.

Plain reading: the leader must issue timestamps (HLC/TSO) so OCC decisions use a consistent clock. It does not say "OCC checks at the store layer are redundant." In fact it presupposes that OCC decisions happen — the rule exists to make them correct. HLC and OCC are orthogonal: HLC makes startTS/commitTS globally monotonic; OCC enforces commitTS > last write per key and >= all reads. Two writers sharing a startTS can still collide on a key even with perfect HLC; OCC rejects the loser.

docs/collection_metadata_delta_design.md:90 even spells it out: Raft leader uniqueness guarantees single-source commitTS, and OCC on field/member keys is what provides mutual exclusion.

Load-bearing tests that would break if OCC were removed

  • store/lsm_store_txn_test.go:101 TestPebbleStore_ApplyMutations_WriteConflict — direct write-write conflict.
  • store/lsm_store_txn_test.go:472 TestPebbleStore_ApplyMutations_ReadConflict — direct read-write conflict via checkReadConflicts.
  • kv/fsm_occ_test.go:75 TestOnePhaseTxnDetectsWriteConflict and kv/fsm_occ_test.go:110 TestOnePhaseTxnDetectsReadWriteConflictViaReadKeys — FSM-level OCC.
  • kv/fsm_abort_test.go:306 TestFSMAbort_SecondAbortSameKeysIsIdempotent — abort-path semantics that depend on the store-level conflict check being honored.

Offer

If the underlying concern is coupling between the hot OCC path and the observability counter, I am happy to move the record(...) call into a small wrapper near the NewWriteConflictError return site so OCC logic stays visually isolated. Happy to do that as a follow-up if you confirm.

/gemini review

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a protocol-independent MVCC store–layer metric for OCC write conflicts, bucketed by conflict kind (read/write) and a bounded key-prefix classification, and wires a new collector into node monitoring to export elastickv_store_write_conflict_total{group,kind,key_prefix}.

Changes:

  • Add per-store write-conflict counters and key-prefix classification, incremented on WW and RW conflict paths immediately before returning NewWriteConflictError.
  • Introduce monitoring.WriteConflictCollector and registry plumbing to poll stores every 5s and export conflict deltas to Prometheus.
  • Extend store.MVCCStore (and wrappers) with WriteConflictCountsByPrefix() plus comprehensive unit/integration tests.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
store/write_conflict_counter.go New bounded key-prefix classifier + mutex-protected per-bucket counters and label encode/decode helpers.
store/store.go Extends MVCCStore interface with WriteConflictCountsByPrefix() snapshot API.
store/mvcc_store.go Adds in-memory store conflict counter recording + snapshot accessors.
store/lsm_store.go Adds Pebble store conflict counter recording in WW/RW conflict checks + snapshot accessors.
kv/shard_store.go Aggregates shard-group conflict snapshots into a node-wide snapshot for ShardStore.
kv/leader_routed_store.go Delegates conflict snapshot access to the local store with nil-safety.
monitoring/write_conflict.go New Prometheus metric + polling collector that emits positive deltas with reset rebasing.
monitoring/registry.go Registers write-conflict metrics and exposes WriteConflictCollector().
main.go Factors monitoring startup into helper and wires in the new write-conflict collector sources.
store/write_conflict_counter_test.go Unit tests for classification, counter record/snapshot behavior, and label decoding.
store/write_conflict_store_test.go Integration tests asserting WW/RW conflict paths increment the expected buckets (Pebble + in-memory).
monitoring/write_conflict_test.go Collector tests for deltas, reset rebasing, malformed label skipping, and nil-safety.

Comment thread monitoring/write_conflict_test.go Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@bootjp bootjp enabled auto-merge April 22, 2026 15:56
@bootjp bootjp merged commit b95750d into main Apr 22, 2026
8 checks passed
@bootjp bootjp deleted the obs/store-write-conflict-metric branch April 22, 2026 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants